Totally data-driven intonation prediction model using a novel F0 contour parametric representation

Authors

  • Lifu Yi
  • Jian Li
  • Xiaoyan Lou
  • Jie Hao
Abstract

This paper proposes a novel parametric representation of Mandarin intonation based on orthogonal polynomial approximation. The polynomial is a simplified representation of the Parallel Encoding and Target Approximation (PENTA) intonation model that includes a target component and an approximation component. We also propose predicting the polynomial parameters from linguistic and phonetic attributes with generalized linear models (GLMs). The optimal attributes are selected automatically by stepwise regression, so both the model structure and the model coefficients are optimized in a totally data-driven manner. In addition, speaking rate is introduced as a new attribute for prediction. Applied to intonation prediction for Mandarin speech, the method achieves an F0 RMSE of 30.21 Hz and a correlation coefficient of 0.85 in an open test. Informal perceptual experiments show that the predicted intonation is quite appropriate and natural.
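
As a rough illustration of the pipeline the abstract describes, the sketch below fits a low-order orthogonal polynomial to one F0 contour, selects predictor attributes by a simple forward stepwise procedure, and computes RMSE and correlation. Everything in it is an assumption made for illustration: the abstract does not name the polynomial family (Legendre polynomials of degree 2 are used here), the stepwise criterion (a relative drop in residual sum of squares stands in for a proper significance test), or the GLM form (ordinary least squares, the simplest Gaussian case). The function names `fit_contour`, `reconstruct`, `forward_stepwise`, and `rmse_and_corr` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def fit_contour(f0, degree=2):
    """Least-squares Legendre coefficients for one F0 contour (assumed basis)."""
    f0 = np.asarray(f0, dtype=float)
    t = np.linspace(-1.0, 1.0, len(f0))  # time normalized to [-1, 1]
    return legendre.legfit(t, f0, degree)


def reconstruct(coeffs, n_frames):
    """Evaluate the fitted polynomial on n_frames evenly spaced points."""
    t = np.linspace(-1.0, 1.0, n_frames)
    return legendre.legval(t, coeffs)


def forward_stepwise(X, y, min_gain=0.1):
    """Greedy forward selection of columns of X for a linear model of y.

    A column is added only if it lowers the residual sum of squares by at
    least min_gain (as a fraction of the current value). This threshold is a
    stand-in for the F-test or information criterion a real stepwise
    regression would use.
    """
    n, p = X.shape
    selected = []
    best_rss = np.sum((y - y.mean()) ** 2)  # intercept-only model
    while len(selected) < p:
        trials = []
        for j in range(p):
            if j in selected:
                continue
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            trials.append((np.sum((y - A @ beta) ** 2), j))
        rss, j = min(trials)
        if best_rss - rss <= min_gain * best_rss:
            break  # no remaining attribute helps enough
        selected.append(j)
        best_rss = rss
    return selected


def rmse_and_corr(ref, pred):
    """Root-mean-square error and Pearson correlation between two contours."""
    ref = np.asarray(ref, dtype=float)
    pred = np.asarray(pred, dtype=float)
    rmse = np.sqrt(np.mean((ref - pred) ** 2))
    corr = np.corrcoef(ref, pred)[0, 1]
    return rmse, corr


# Synthetic contour (Hz), invented purely for demonstration.
t = np.linspace(-1.0, 1.0, 11)
f0 = 200.0 + 30.0 * t - 20.0 * t ** 2
coeffs = fit_contour(f0)  # three scalars describing the whole contour
print(coeffs, rmse_and_corr(f0, reconstruct(coeffs, len(f0))))
```

In a setup like this, each of the three coefficients would become a separate regression target, with `forward_stepwise` choosing which attribute columns (speaking rate among them) enter each model. Note that Legendre polynomials are orthogonal over the continuous interval, not exactly on a finite frame grid.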

Similar resources

A Unified Totally-Data-Driven Framework for Duration and Intonation Modeling

This paper proposes a unified framework for duration and intonation modeling in Mandarin TTS. In this framework, we design a novel parametric representation of Mandarin intonation based on orthogonal polynomial approximation. With this representation, we can decompose the F0 vector into 3 orthogonal polynomial parameters that are continuous scalars. Based on this vector-to-scalar decomposition, we ca...

The Copasul Intonation Model

A new data-driven and linguistically interpretable intonation model for the automatic analysis and synthesis of fundamental frequency contours is introduced: the CoPaSul model, which provides a contour-based (Co), parametric (Pa), and superpositional (Sul) intonation representation. Its application in F0 analysis and generation is described as well as its linguistic anchoring with respect to se...

Automatisation of intonation modelling and its linguistic anchoring

This paper presents a fully machine-driven approach for intonation description and its linguistic interpretation. For this purpose, a new intonation model for bottom-up F0 contour analysis and synthesis is introduced, the CoPaSul model which is designed in the tradition of parametric, contour-based, and superpositional approaches. Intonation is represented by a superposition of global and local...

A Template-Based Approach for Speech Synthesis Intonation Generation Using LSTMs

The absence of convincing intonation makes current parametric speech synthesis systems sound dull and lifeless, even when trained on expressive speech data. Typically, these systems use regression techniques to predict the fundamental frequency (F0) frame-by-frame. This approach leads to overly-smooth pitch contours and fails to construct an appropriate prosodic structure across the full uttera...

Generation of F0 contours using a model-constrained data-driven method

This paper introduces a novel model-constrained, data-driven method for generating fundamental frequency contours in Japanese text-to-speech synthesis. In the training phase, the parameters of a command-response F0 contour generation model are learned by a prediction module, which can be a neural network or a set of binary regression trees. The input features consist of linguistic information r...

Publication date: 2006